MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 



A.I. Memo No. 1183 December 1989 

Consensus Knowledge Acquisition 



Andrew Trice 
Randall Davis 



Abstract 

We have developed a method and prototype program for assisting two 
experts in their attempts to construct a single, consensus knowledge 
base. We show that consensus building can be effectively facilitated by 
a debugging approach that identifies, explains, and resolves discrepan- 
cies in their knowledge. To implement this approach we identify and 
use recognition and repair procedures for a variety of discrepancies. Ex- 
amples of this knowledge are illustrated with sample transcripts from 
CARTER, a system for reconciling two rule-based systems. Implications 
for resolving other kinds of knowledge representations are also exam- 
ined. 
1 INTRODUCTION 

There is a curious contradiction in the current state of practice of knowledge acqui- 
sition: At a time when the view is widely shared that knowledge in organizations is 
distributed among multiple experts, and information systems are seen as an effective 
way to coordinate the activities of groups, common practice in knowledge acquisi- 
tion still focuses on acquiring the knowledge of a single individual. Research in both 
artificial intelligence (Davis, 1982; Mittal and Dym, 1985) and information systems 
(Fellers, 1987; Fedorowicz and Manheim, 1986; Mumford, 1987) has identified this 
gap as a major barrier to the development of more powerful knowledge systems. 

Until now, expert system developers have dealt with this difficulty either by re- 
fraining from building multi-expert systems entirely; by appointing one of the experts 
as "knowledge czar," thereby giving him the final word in any dispute; or merely by 
requiring experts to achieve consensus on their own, without any systematic assis- 
tance. Multi-expert acquisition techniques that have been proposed to date have 
tended to be either very restrictive mathematical formulations (Gaglio et al., 1985), 
adaptations of established group decision-making techniques (Jagannathan and El- 
maghraby, 1985), or methods that focus on simply using knowledge from multiple 
sources rather than finding and resolving the conflicts and inconsistencies in that 
knowledge. 

We call a process by which multiple experts attempt to construct a single consen- 
sus knowledge base "consensus knowledge acquisition" (CKA). The objective of our 
research is to develop ideas and tools to facilitate this activity. Specifically, we have 
drawn on and extended work in artificial intelligence, information systems design, and 
negotiation, to create a debugging system capable of aiding two (or more) experts in 
systematically identifying, explaining, and resolving discrepancies in their knowledge. 

We begin discussion of the issues by outlining several approaches to acquiring 
and using multiple bodies of expertise. We then argue for an approach focused on 
debugging and present a set of ideas in this vein. We describe the mechanisms we have 
developed for detecting and reconciling knowledge base discrepancies, illustrating 
these procedures with sample transcripts from our prototype system. Finally, we 
calibrate the contribution of our work and suggest promising future directions. 

2 HOW CAN WE HANDLE MULTIPLE EXPERTS? 

The problem of reconciling multiple points of view has been an issue of study for some 
time in areas as widespread as group decision making, mathematical psychology, and 
management science. One interesting way to view these disparate approaches is to 
categorize them according to whether they are descriptive or normative, and where 
they focus their efforts at consensus: on outcome, process, or knowledge. 
2.1 Descriptive Approaches 

Descriptive approaches to this problem are fundamentally concerned with under- 
standing how groups of decision makers actually behave when required to produce 
a single answer. Behavior has been studied in both field settings (e.g. Janis, 1982) 
and various controlled laboratory conditions (e.g. Davis, 1980; Hammond, 1975). A 
commonly observed phenomenon is the existence of psychological barriers to effective 
decision making: factors such as conformity pressure, shyness, unequal distribution 
of power, and others can all affect both the process of coming to a decision and the 
quality of the decision that results. 

2.2 Outcome Combination Methods 

Work aimed at combining outcomes is illustrated by ideas like voting (Miner, 1984), 
averaging (Aczel and Saaty, 1983), and decomposition and re-synthesis (Brehmer and 
Hagafors, 1986). The objective is to arrive at a decision which, while not necessarily 
reflecting a consensus of the experts, is still better than any single expert could have 
arrived at alone. 

These methods are largely normative — concentrating on how judgments ought 
to be combined rather than on what typically happens in groups, and are focused on 
outcome — it is the experts' final recommendations that are combined. 

The effectiveness of these methods depends on the validity of their assumptions 
about both the nature of the outcome and the skill mix of the experts. Nature of 
the outcome matters because, for example, voting is appropriate when the scale of 
outcome values is nominal, while averaging is suitable when it is a ratio. Assumptions 
about skill mix are crucial because averaging makes no sense unless expert errors are 
distributed randomly, while decomposition and re-synthesis assumes that they vary 
systematically across subproblems (i.e., experts have different sub-specialties). 
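
The dependence on outcome scale can be made concrete with a small sketch. It assumes three hypothetical expert recommendations; the function names and sample values are illustrative only and are not taken from the methods cited above.

```python
from collections import Counter
from statistics import mean

def combine_by_voting(recommendations):
    """Nominal outcomes: return the most frequently recommended category."""
    return Counter(recommendations).most_common(1)[0][0]

def combine_by_averaging(estimates):
    """Ratio-scale outcomes: return the arithmetic mean of the estimates."""
    return mean(estimates)

# Hypothetical nominal outcome: wine color recommended by three experts.
print(combine_by_voting(["white", "rose", "white"]))

# Hypothetical ratio-scale outcome: a suggested price in dollars.
print(combine_by_averaging([12.0, 15.0, 18.0]))
```

Averaging the color labels would be meaningless, and a vote over prices would discard most of the information in the estimates, which is why the scale of the outcome constrains the choice of method.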

The fundamental problem with these methods is their focus on outcome rather 
than on the reasoning used to determine it. We believe it is premature to combine 
results before even attempting to achieve consensus on the underlying knowledge 
used to arrive at those results. Exploring that knowledge may reveal key differences 
in reasoning, vocabulary, or problem assumptions which, once reconciled, remove the 
outcome discrepancy entirely. There are also ownership issues to consider: If we 
combine results without allowing discussion of the underlying rationale, the experts 
are more likely to be unhappy with or unwilling to take responsibility for the result. 

These methods may prove useful in cases where experts have discussed the ra- 
tionales and still cannot reach agreement, or in situations where the knowledge bases 
exist but the experts responsible for them are unavailable. 

2.3 Argumentation 

A second approach, argumentation methods, centers on helping people make ex- 
plicit the logical structure of their positions. Structured frameworks for analyzing 
arguments (Toulmin, 1958; Fogelin, 1982), for instance, enable different parties in a 
debate to cooperate in constructing and making precise the arguments for and against 
a particular assertion. These ideas have recently been embodied in computer-based 
tools (e.g., Smolensky et al., 1987; Stefik et al., 1987; Lowe, 1985; Nunamaker et al., 
1988) that aid users in constructing and manipulating the arguments, and sometimes 
offer spreadsheet-like capabilities that facilitate exploring the impact of changing an 
assumption. 

These tools are normative in their approach to consensus building and almost 
entirely process oriented: they assist experts in the process of deliberating and de- 
bating, but, importantly, do not suggest resolutions to inconsistencies. As such they 
introduce an element of rigor into the deliberation process, but offer little guidance 
in resolving differences between the experts. 

2.4 Debugging the Knowledge 

We do not want to focus on outcome alone, because we believe that the fundamental 
task is to reach consensus on the knowledge itself: differences in outcome may simply 
be symptoms of a disagreement about what to know. In that case dealing with 
outcome is treating the symptoms rather than the cause, while dealing with the 
differences in knowledge solves the root problem and may eliminate all the symptoms. 

We choose not to focus on formal argumentation in the belief that the knowledge 
representation in use — in this case rules — provides sufficient basic structure to the 
discussion. 

Instead we seek to assist the experts in detecting, deliberating over and recon- 
ciling discrepancies between them. Our approach is normative and focused on the 
underlying knowledge used by each expert: we want to understand how experts ought 
to come to agreement and we want that agreement to be about the thing we consider 
to be fundamental to this undertaking — the knowledge used to make the decisions. 
Debugging is a technique well suited to our goals because it centers on the detec- 
tion, explanation, and repair of defects in symbolic systems. As a result we use the 
phrase "debugging" the knowledge to characterize both the focus of our efforts and 
the primary technique we employ. 

3 SOME USEFUL IDEAS 

Given this perspective, three research areas provide relevant concepts. Artificial 
intelligence (AI) offers the literature on knowledge-based systems and a body of work on 
debugging; information systems provides general guidelines for synthesizing multiple 
points of view; while work in negotiation and conflict resolution suggests the role of 
a third party facilitator. 

From AI we exploit the notion that the knowledge representation in use can assist 
consensus-building by providing a structure and vocabulary for comparing arguments 
and the knowledge on which they are based. One familiar example is the explanation 
facility provided by rule-based systems (Davis et al., 1977). These allow a user to 
trace the steps the program followed in reaching a particular conclusion, providing 
a representation of the argument a domain expert would put forward in support of 
his recommendation. This provides a concrete and specific focus to the discussion. 
The differences between two such reasoning chains can then be described using the 
vocabulary provided by the representation, in this case the notion of if/then rules, 
attribute-object-value triples, strengths of certainty, etc. This helps to establish the 
agenda for discussion between the two experts. 

Program debugging research takes this idea a step further. Many debugging 
systems (Brown and Burton, 1978; Kuper, 1989) have developed bug taxonomies 
that specify the kinds of things that can go wrong, the probable causes underlying 
them, and the corresponding repairs. A key idea here is that knowledge about the 
program being debugged can itself be used to help guide the repair. Davis (1979), 
for example, used knowledge about knowledge base structure to support individual 
knowledge acquisition. Our research can be viewed as the extension of this work to 
the multiple expert case. 

From information systems design we adapt methodologies used to resolve conflicting 
points of view (e.g., Mumford, 1987; Mason and Mitroff, 1981; Hammond et 
al., 1984). These methodologies advocate, first, full and active participation from all 
involved parties. This suggests that we should structure the CKA process so that 
the two experts are likely to have equal influence on design decisions. Second, both 
adversarial and conciliatory activities are needed to maximize the validity of the fi- 
nal design (Henderson, 1987). This implies that we require tools both for enabling 
experts to understand how they differ and for suggesting ways to resolve their con- 
flicts. Third, it is more effective to focus expert discussion on decision criteria rather 
than on outcomes (Hammond et al., 1984). This has helped encourage our focus 
on knowledge rather than process. Finally, the resulting consensus system must be 
based on a foundation of commonly understood terms, because agreement on the 
higher-level behavior of the system critically depends on this mutual understanding: 
If the basic vocabulary differs, the two participants are speaking different, possibly 
incommensurate languages. 

From negotiation, we use the metaphor of the third party mediator. A program 
for facilitating CKA can be thought of as a facilitator whose job is to aid in resolving 
discrepancies between two experts. Although CKA is somewhat different from a 
traditional negotiation situation, there is still a useful resemblance. First, negotiation 
gives us a vocabulary for characterizing the range of roles a CKA program attempts 
to fill (e.g., "non-binding arbitrator", "process consultant"). Second, it can help us 
understand the probable consequences of various discrepancy resolution strategies. 
For instance, if mediators attempt to resolve easy issues before hard ones, they may 
create a cooperative climate between the parties, but risk alienating parties who view 
discussing trivial issues as a waste of time (Rubin, 1981). 
4 BOUNDING THE PROBLEM 

We make several assumptions to help bound the task we take on here. First, we 
assume the expertise to be reconciled is homogeneous in the sense that both experts 
are capable of solving the entire problem. This enables us to focus on resolving 
discrepancies rather than combining knowledge from distinct fields. 

A second, related assumption of our approach is that the experts already have a 
shared frame of reference, some basic set of assumptions in common. Without that, 
determining where they agree and disagree would be difficult, not only for our system, 
but for any human attempting the task. 

Third, we assume the experts have constructed individual knowledge bases (KBs) 
prior to the start of the process. This ensures that the experts can explain the rea- 
soning they used to arrive at their answers and that that reasoning can be adequately 
captured by a known reasoning process. This in turn allows us to focus on debug- 
ging the knowledge — detecting and resolving differences — rather than knowledge 
acquisition. 

Fourth, experts involved in CKA are assumed to have equal influence on the 
process. The intent is that the content of the consensus KB be determined by rational 
deliberation rather than political or organizational factors. A related assumption 
is that any conflict between the experts arises from disagreements about facts and 
judgments rather than from conflicting interests, as in a bargaining situation. 

Finally, as simplifying assumptions at the outset we consider only rule-based 
representations of knowledge, and only two experts, as a way of providing a foundation 
for our initial efforts. 

Two other points will help to set the context for our work. First, it is a fundamen- 
tal premise of the work that a consensus KB can perform better than an individual 
expert's KB. Our hypothesis is that unearthing and resolving differences between two 
experts will be fundamentally synergistic, removing limitations and defects in both of 
their KBs. This is plausible but of course not guaranteed: some consensus knowledge 
bases may not be as good as either of the originals. 

Second, our point of view is normative rather than descriptive, unlike much of 
the work in group decision making, which attempts to describe the complex set of 
psychological phenomena that occur in such settings (e.g., Janis, 1982). Rather than 
asking what does happen when groups of experts interact, we ask how two experts 
should behave to maximize the benefit from collaboration. This is illustrated in 
part by our assumption above that the multiple experts have equal influence on the 
process. As with any normative group decision making process, we look for ways 
of proceeding that attenuate the psychological barriers. We believe that focusing 
discussion on repairing specific discrepancies in knowledge is one useful mechanism 
for achieving this. 
5 CARTER 

We are developing a prototype system for facilitating CKA, dubbed CARTER (Conflict 
AnalyzeR for Targeted Expert Resolution). The system plays the role of a non- 
binding arbitrator mediating between two experts (Figure 1). 

CARTER examines each expert's KB, looking for matches and conflicts between 
them, deciding which discrepancy to try to resolve, and suggesting possible resolu- 
tions. The two experts discuss the suggested resolution and can choose to update 
their KBs as suggested, update them in some other manner, or not update them 
at all. Whatever the decision, the agreed-upon knowledge is added to the third, 
consensus knowledge base. The experts' KBs are then analyzed anew, with the cycle 
repeating until those two agree exactly, or no further areas of consensus can be found. 
In practice, the process is slightly more complex than this, but this gives a sense of 
the basic structure. We use transcripts of CARTER in operation to illustrate some of 
our discrepancy resolution techniques. 
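
The cycle just described can be sketched in a few lines. This is a deliberately simplified illustration, not CARTER's implementation: rules are treated as opaque strings, a discrepancy is any rule held by only one expert, and `decide` is a hypothetical callback standing in for the experts' discussion.

```python
def run_cka(kb1, kb2, decide):
    """Repeat detect / discuss / update until the KBs agree or nothing is left.

    kb1, kb2: sets of rules (here plain strings, an assumption).
    decide(rule) -> "keep", "drop", or "defer" (hypothetical expert decision).
    Returns (consensus rules, rules left unresolved).
    """
    kb1, kb2 = set(kb1), set(kb2)
    consensus = kb1 & kb2      # knowledge the experts already share
    deferred = set()           # discrepancies the experts chose not to settle
    while True:
        pending = sorted((kb1 ^ kb2) - deferred)
        if not pending:
            break              # KBs agree exactly, or no further consensus found
        rule = pending[0]
        choice = decide(rule)
        if choice == "keep":   # both experts adopt the rule
            kb1.add(rule)
            kb2.add(rule)
            consensus.add(rule)
        elif choice == "drop": # both experts abandon the rule
            kb1.discard(rule)
            kb2.discard(rule)
        else:                  # no update; set the discrepancy aside
            deferred.add(rule)
    return consensus, deferred

consensus, unresolved = run_cka(
    {"beef->red", "turkey->white"},
    {"beef->red", "fish->white"},
    lambda rule: "keep" if rule.startswith("fish") else "defer",
)
print(sorted(consensus), sorted(unresolved))
```

The loop ends under the two conditions named above: the individual KBs coincide, or every remaining discrepancy has been set aside.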



Figure 1: CARTER scenario. CARTER examines KB 1 and KB 2, identifies discrepancies, 
and suggests resolutions; the two experts confer and update their KBs. 
5.1 USING DISCREPANCY KNOWLEDGE 

Our initial efforts at CKA focus on information derived from the KBs themselves: 
CARTER examines the two KBs to detect discrepancies and to determine how they 
might be made consistent. The KBs consist of rules expressed in terms of object, 
attribute, and value triples that supply a topology of relationships between the concepts. 
CARTER's knowledge lies in detecting specific kinds of discrepancies and linking 
them with one or more potential resolutions. 
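
A minimal sketch of how such a KB might be represented follows. It assumes a simplified attribute-value form in which the object part of each triple is folded into the attribute name; `Rule`, `make_rule`, `determiners`, and `domain` are hypothetical names, not CARTER's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """IF every (attribute, value) pair in premises holds THEN conclusion holds."""
    premises: frozenset   # set of (attribute, value) pairs
    conclusion: tuple     # a single (attribute, value) pair

def make_rule(premises, conclusion):
    """Build a Rule from a dict of premises and a conclusion pair."""
    return Rule(frozenset(premises.items()), conclusion)

def determiners(kb, attribute):
    """Attributes used in the premises of rules that conclude `attribute`."""
    return {a for r in kb if r.conclusion[0] == attribute for a, _ in r.premises}

def domain(kb, attribute):
    """Values that `attribute` takes anywhere in the KB."""
    pairs = set()
    for r in kb:
        pairs |= r.premises | {r.conclusion}
    return {v for a, v in pairs if a == attribute}

# Two of Kevin's rules, taken from the transcripts later in this section.
kevin_kb = [
    make_rule({"entree-name": "turkey"}, ("wine-color", "white")),
    make_rule({"wine-color": "white", "wine-sweetness": "dry"},
              ("wine-grape", "pinot-gris")),
]
print(determiners(kevin_kb, "wine-grape"))
print(domain(kevin_kb, "wine-color"))
```

The `determiners` relation is what gives the KB its graph structure: each attribute is a node, and an edge runs from every premise attribute to the attribute it helps conclude.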

As an example, imagine that two wine experts, Kevin and Mary, have each con- 
structed a KB that recommends a specific wine to go with dinner, and now wish to 
create a single, consensus KB. Among the discrepancies they might encounter are: 

1. differences in the nature of the outcome: one expert may specify a wine grape 
(e.g., Pinot-Blanc) while the other specifies both grape and vintage (e.g., Pinot- 
Blanc '83). 

2. differences in vocabularies: one expert may refer to the body of the wine, while 
the other refers to its robustness. 

3. differences in pattern of inference: the experts may agree on the overall vocab- 
ulary, but interconnect them differently, as for instance if one expert uses the 
character of the meal (spicy or bland) to help infer which wine to select, while 
the other relies on the category of the main dish (e.g., meat or fish). 

4. differences in the rules: the experts may agree on the vocabulary and intercon- 
nection between terms, but suggest different specific values, as for instance if 
one expert reasons that a turkey dish suggests a white wine, while the other 
reasons that a turkey dish suggests a rose wine. Both are reasoning from the 
type of the main dish to the color of wine, but come out with different values. 

CARTER's overall strategy is to attack these in the order given. This approach 
is motivated by both the computational and negotiation character of the task. The 
computational task faced by the system is one of matching two collections of rules 
that are at one level simply directed graphs; any useful guidance about where to start 
the matching process will vastly improve the system's chances of making intelligent 
suggestions. Expressed in these terms, we anchor the search at the end of the graph, 
trying first to match the outcomes, then working backwards, matching the nodes 
connected to the outcome, and continuing to work backward from there. Starting 
with the outcome is sensible because it relies on the heuristic that two KBs about the 
same topic are likely to have the same attribute as their goal. 

Starting with the outcome is also sensible from the negotiation point of view: it 
is difficult to imagine an effective discussion about the details if the two knowledge 
bases are trying to arrive at different kinds of conclusions. 

Figure 2 shows the beginning of this process. CARTER starts by determining the 
goal of each KB, a simple task since it is by definition the sole attribute that appears 
only in the conclusions of rules (i.e., it is inferred by rules but nothing further is 
inferred from it). CARTER identifies Kevin's goal as wine-region and Mary's as 
wine-name. 
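
The goal test can be sketched as a set difference. The sketch assumes each rule is reduced to the attributes it mentions; the rule lists are illustrative reconstructions from the transcripts in this section, and `find_goal` is a hypothetical name.

```python
def find_goal(rules):
    """Return the single attribute that is concluded but never used as a premise.

    rules: list of (premise_attributes, concluded_attribute) pairs.
    """
    concluded = {conclusion for _, conclusion in rules}
    used = {a for premises, _ in rules for a in premises}
    candidates = concluded - used
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one goal, found {sorted(candidates)}")
    return candidates.pop()

# Attribute-level view of the two KBs (illustrative, not the complete KBs).
kevin_rules = [
    ({"entree-name"}, "wine-color"),
    ({"wine-color", "wine-sweetness", "wine-body"}, "wine-grape"),
    ({"wine-grape", "wine-color"}, "wine-region"),
]
mary_rules = [
    ({"entree-name"}, "wine-recommendedcolor"),
    ({"wine-recommendedcolor", "wine-recommendedsweetness",
      "wine-recommendedbody"}, "wine-name"),
]
print(find_goal(kevin_rules), find_goal(mary_rules))
```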



Expert 1, what is your name? KEVIN 
Expert 2, what is your name? MARY 

KEVIN and MARY, the first thing I want to do is get some basic 
agreement on what the goal of the consensus KB should be. I am 
analyzing your individual KB's in an attempt to match them up. 

OK. Here are the results of my analysis. 



KEVIN has goal WINE-REGION 
MARY has goal WINE-NAME 

Figure 2: Identifying goals. 



The system's next task is to decide as best it can whether these two things 
represent identical concepts. The judgment about the real meaning of these two 
terms can only come from the experts, but the system can make a surprisingly good 
guess by examining three kinds of circumstantial evidence available in the knowledge 
base: 

• Are the concept labels the same? In this case they are not (wine-region vs. 
wine-name), but this can of course be an artifact of name-choice or (in other 
circumstances) variations in spelling or abbreviation. Conversely, a match in 
labels is useful evidence but no guarantee of match in meaning. 

• In the case of attributes, are the values the same? Once again here the answer 
is no (e.g., California, rhone, etc., vs. chablis, gamay etc.). 

• Are they inferred from the same concepts and are they in turn used to infer the 
same concepts? That is, do they occupy similar places in the topology of the 
knowledge base? Once again in this case the answer is no. 

Note that the last form of evidence makes the process recursive: to determine 
whether two concepts in the conclusion of a rule are the same, we need to determine 
whether the concepts mentioned in the premise are the same, thereby starting the 
process all over again with the premise concepts. 
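
One way the three kinds of evidence might be combined is sketched below. The equal weighting, the depth limit on the recursion, and the use of premise-side topology only are all assumptions made for illustration; the text does not specify how CARTER weighs the evidence. Value lists are abbreviated, and the sweetness, body, and color domains are partly invented.

```python
def attr(values=(), inputs=()):
    """Describe an attribute by its value set and the attributes that determine it."""
    return {"values": set(values), "inputs": set(inputs)}

def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similarity(kb1, a1, kb2, a2, depth=2):
    """Average of label, value, and topological evidence (assumed equal weights)."""
    label = 1.0 if a1 == a2 else 0.0
    values = jaccard(kb1[a1]["values"], kb2[a2]["values"])
    in1, in2 = kb1[a1]["inputs"], kb2[a2]["inputs"]
    if depth == 0 or not in1 or not in2:
        topology = 0.0
    else:
        # Recursive step: compare the attributes each concept is inferred from.
        best = [max(similarity(kb1, x, kb2, y, depth - 1) for y in in2) for x in in1]
        topology = sum(best) / len(best)
    return (label + values + topology) / 3

colors, sweetness, body = {"white", "red", "rose"}, {"dry", "sweet"}, {"light", "full"}
kevin = {
    "wine-color": attr(colors),
    "wine-sweetness": attr(sweetness),
    "wine-body": attr(body),
    "wine-grape": attr({"chablis", "gamay", "pinot-gris"},
                       {"wine-color", "wine-sweetness", "wine-body"}),
    "wine-region": attr({"california", "rhone"}, {"wine-grape", "wine-color"}),
}
mary = {
    "wine-recommendedcolor": attr(colors),
    "wine-recommendedsweetness": attr(sweetness),
    "wine-recommendedbody": attr(body),
    "wine-name": attr({"chablis", "gamay", "soave"},
                      {"wine-recommendedcolor", "wine-recommendedsweetness",
                       "wine-recommendedbody"}),
}
print(similarity(kevin, "wine-region", mary, "wine-name"))
print(similarity(kevin, "wine-grape", mary, "wine-name"))
```

Under these assumptions wine-grape scores higher against wine-name than wine-region does, even though no labels match, because both the values and the determining attributes overlap.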
Weighing the evidence in the case at hand (Figure 3), CARTER concludes that 
wine-region and wine-name are not identical concepts. In response it tries a differ- 
ent tactic, invoking the heuristic that the goal attribute of one KB might be found 
on the route to the goal attribute of the other KB. That is, since the two endpoints 
in the graphs do not match, perhaps the endpoint in one matches with one of the 
intermediate points (conclusions) in the other. To explore this possibility, CARTER 
tries to match wine-region of Kevin's KB (KKB) with the attributes that deter- 
mine wine-name of Mary's KB (MKB), using the same criteria of label, value, and 
topological correspondence. 

As it turns out, this too fails, so CARTER tries it the other way around, attempting 
to match wine-name of MKB with the attributes determining wine-region in KKB. 
One of these attributes, wine-grape, happens to match rather well with wine-name. 
Although their names are not the same, they share eight different values, along with 
three attributes used to determine them. This is strong evidence that wine-grape 
and wine-name are the same concepts. 
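
The fallback search can be sketched as follows. For brevity the sketch uses value overlap alone as the matching criterion, with an arbitrary threshold of 0.5; both are assumptions, since CARTER also weighs labels and topology. The wine-grape, wine-name, and wine-region values come from the transcripts; the color, sweetness, and body domains are partly invented.

```python
def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def align_goals(kb_a, goal_a, kb_b, goal_b, threshold=0.5):
    """Return (attribute in A, attribute in B) judged to match, or None."""
    def match(x, y):
        return jaccard(kb_a[x]["values"], kb_b[y]["values"]) >= threshold
    if match(goal_a, goal_b):
        return goal_a, goal_b
    # A's goal against the attributes that determine B's goal.
    for candidate in sorted(kb_b[goal_b]["inputs"]):
        if match(goal_a, candidate):
            return goal_a, candidate
    # The other way around: B's goal against the attributes determining A's goal.
    for candidate in sorted(kb_a[goal_a]["inputs"]):
        if match(candidate, goal_b):
            return candidate, goal_b
    return None

shared = {"chablis", "chardonnay", "chenin-blanc", "riesling",
          "gamay", "pinot-noir", "zinfandel", "sauvignon-blanc"}
colors, sweetness, body = {"white", "red", "rose"}, {"dry", "sweet"}, {"light", "full"}
kevin = {
    "wine-color": {"values": colors, "inputs": set()},
    "wine-grape": {"values": shared | {"pinot-gris", "semillion", "napa-gamay",
                                       "pinot-blanc"},
                   "inputs": {"wine-color", "wine-sweetness", "wine-body"}},
    "wine-region": {"values": {"california", "rhone", "red-burgundy", "st-emillion",
                               "beaujolais", "white-burgundy", "cotes-du-rhone"},
                    "inputs": {"wine-grape", "wine-color"}},
}
mary = {
    "wine-recommendedcolor": {"values": colors, "inputs": set()},
    "wine-recommendedsweetness": {"values": sweetness, "inputs": set()},
    "wine-recommendedbody": {"values": body, "inputs": set()},
    "wine-name": {"values": shared | {"burgundy", "valpolicella", "soave"},
                  "inputs": {"wine-recommendedcolor", "wine-recommendedsweetness",
                             "wine-recommendedbody"}},
}
print(align_goals(kevin, "wine-region", mary, "wine-name"))
```

With eight of fifteen distinct values shared, wine-grape and wine-name clear the assumed threshold, so the search stops at the third step, mirroring the order of attempts described above.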



I'm afraid these goals do not match. 

Also, KEVIN'S goal doesn't match up with any concepts in MARY's KB. 

However, MARY'S goal does seem to correspond to an attribute 
determining KEVIN'S goal, in particular, WINE-GRAPE, because 
The domain of WINE-GRAPE of KEVIN'S KB and 
the domain of WINE-NAME of MARY's KB match up: 
Values in common are: CHABLIS CHARDONNAY CHENIN-BLANC RIESLING 
    GAMAY PINOT-NOIR ZINFANDEL SAUVIGNON-BLANC 
Values only KEVIN has are: PINOT-GRIS SEMILLION NAPA-GAMAY 
    PINOT-BLANC 
Values only MARY has are: BURGUNDY VALPOLICELLA SOAVE. 

In addition, 
WINE-COLOR, WINE-SWEETNESS, and WINE-BODY of KEVIN'S KB, which 
determine WINE-GRAPE, match with 
WINE-RECOMMENDEDCOLOR, WINE-RECOMMENDEDSWEETNESS, and 
WINE-RECOMMENDEDBODY of MARY's KB, which determine WINE-NAME. 

Therefore, it would seem that WINE-REGION is the result of an extra 
operation on the data KEVIN'S KB performs that MARY's KB does not. 

Figure 3: Matching goals. 
The strong match between wine-grape and wine-name enables CARTER to propose 
a specific diagnosis of the discrepancy between the two KBs. Since 
(Figures 3 and 4) wine-name, the goal attribute of MKB, seems to match the concept 
wine-grape, a concept on the route to the goal of KKB, CARTER concludes that 
wine-region is a concept that reflects an additional inference that only Kevin's KB 
performs, thereby providing a more specific recommendation (not just a wine grape, 
but a grape grown in a particular region). 

Mary's Knowledge Base: WINE-NAME 

Kevin's Knowledge Base: WINE-GRAPE => WINE-REGION 

Figure 4: Aligning the two KBs. 



Now that CARTER has analyzed the discrepancy, it can propose a plan to reconcile 
it (Figure 5). One of the repair actions suggests that, when one KB has an attribute 
the other lacks, the discrepancy can be resolved either by adding the attribute to one 
KB or deleting it from the other. That is, the experts need to decide whether the 
consensus KB should use the extra attribute. 
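
The link between a kind of discrepancy and its candidate repairs can be pictured as a lookup table. The sketch below is illustrative only: the category names are invented labels, and the repair descriptions paraphrase the repairs discussed in this section rather than reproduce CARTER's internal knowledge.

```python
# Hypothetical category names; repair texts paraphrase this section.
REPAIRS = {
    "extra-attribute": [
        "add the attribute, and rules concluding it, to the KB that lacks it",
        "delete the attribute from the KB that has it",
    ],
    "synonymous-attributes": [
        "choose a shared attribute name and agree on a consolidated value list",
    ],
    "inconsistent-rules": [
        "check whether the differing conclusion values are synonyms",
        "include both rules in both KBs",
        "add an attribute that makes an unstated assumption explicit",
    ],
}

def propose_repairs(kind):
    """Return the candidate repairs for a discrepancy kind (empty if unknown)."""
    return REPAIRS.get(kind, [])

for option in propose_repairs("extra-attribute"):
    print("-", option)
```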



I recommend that you do the following to reach agreement on the 
goals of the consensus KB. 

First, decide whether WINE-REGION should be included in the 
consensus KB. 

If so, MARY adds the WINE-REGION attribute to her KB and generates 
a new set of rules for determining WINE-REGION. 

If not, KEVIN deletes the WINE-REGION attribute from his KB, and we 
consolidate the attributes WINE-GRAPE and WINE-NAME. 

Do you approve of this plan? Y 

Figure 5: The repair plan. 



At this point, CARTER asks the experts to decide. Whatever the outcome, there 
is additional work to do. Since wine-name and wine-grape are the same underlying 
concept, they must be reconciled. In addition, if the experts decide to include 
wine-region in the consensus KB, Mary must provide her version of the rules that 
take the additional inference step, determining wine-region from wine-name, then 
those rules must be compared with Kevin's and any discrepancy resolved. 

The experts indicate that this plan is reasonable. CARTER then provides information 
about wine-region to assist them (Figure 6), and after some discussion, the experts 
choose not to include it. It is deleted from KKB. 



Here is some information on WINE-REGION: 

Attribute name: WINE-REGION 

KB's found in: KEVIN'S 

Possible values: CALIFORNIA RHONE RED-BURGUNDY ST-EMILLION BEAUJOLAIS 
    RED-BURGUNDY WHITE-BURGUNDY COTES-DU-RHONE 
Objects attribute affects: None 
Attributes determined by: WINE-GRAPE WINE-COLOR 

Now, given this information, do you wish to retain WINE-REGION for 
the consensus KB? (Take all the time you want to discuss it) N 

OK. I'll delete WINE-REGION from KEVIN'S KB now, OK? Y 

WINE-REGION deleted. 

Figure 6: Resolving the discrepancy 



As indicated by the plan CARTER supplied, the next task is to consolidate wine-grape 
and wine-name. Once again CARTER knows about this kind of discrepancy and how 
to resolve it: the experts should choose a shared attribute name and agree on a con- 
solidated list of values for the attribute (Figure 7). The experts choose wine-type as 
the shared term. 
Next, I am going to try to help you consolidate WINE-GRAPE and 
WINE-NAME. 

Given how they match up, are WINE-GRAPE and WINE-NAME the same 
concept? Y 

What do you two want to call this concept? WINE-TYPE 

Fine. I will change the names. 

Figure 7: Consolidating two concepts. 



Reconciling the values is more involved, since we are not sure how the mismatched 
values correspond. CARTER tries to solve the problem by using the topology of the 
KBs to attempt to match the values. As with attribute labels, if two values with 
different labels are linked to other values that do correspond (i.e., they have the same 
label), the different labels may in fact be referring to the same underlying concept. 
The heuristic here is that rules with similar premises typically have similar conclusions 
and that the apparent difference in conclusions can arise simply because the experts 
are using different names (or varying spelling) for the same concept. 

To test this theory, CARTER retrieves the rules that determine wine-type in both 
KBs and picks out those with identical premise value labels but differing conclusions 
(Figure 8). This process picks out pinot-gris in KKB and soave in MKB, because 
they are both inferred by rules that reason from a wine whose sweetness is dry, and 
whose color is white. 
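
This test can be sketched as a pairwise comparison of rules. The sketch assumes each rule is reduced to its set of premise value labels and its concluded value; the second rule in each list is invented to show a pair that is already matched, and `candidate_synonyms` is a hypothetical name.

```python
def candidate_synonyms(rules_a, rules_b, unmatched_a, unmatched_b):
    """Pair unmatched conclusion values whose rules have identical premise values.

    rules_a, rules_b: lists of (premise_values, concluded_value) pairs.
    unmatched_a, unmatched_b: values present in only one of the KBs.
    """
    pairs = []
    for premise_a, value_a in rules_a:
        for premise_b, value_b in rules_b:
            if (premise_a == premise_b
                    and value_a in unmatched_a and value_b in unmatched_b):
                pairs.append((value_a, value_b))
    return pairs

kevin_rules = [
    (frozenset({"white", "dry"}), "pinot-gris"),
    (frozenset({"red", "dry"}), "gamay"),        # hypothetical rule
]
mary_rules = [
    (frozenset({"white", "dry"}), "soave"),
    (frozenset({"red", "dry"}), "gamay"),        # hypothetical rule
]
kevin_only = {"pinot-gris", "semillion", "napa-gamay", "pinot-blanc"}
mary_only = {"burgundy", "valpolicella", "soave"}

print(candidate_synonyms(kevin_rules, mary_rules, kevin_only, mary_only))
```

The result is only a suggestion: whether the paired values really name the same wine is a question for the experts.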
Now, let's establish the consensus values for WINE-TYPE. 
So far, we have the following set of consensus values: 
CHABLIS CHARDONNAY CHENIN-BLANC RIESLING GAMAY PINOT-NOIR ZINFANDEL 
SAUVIGNON-BLANC 

Values still to be accounted for are 

PINOT-GRIS SEMILLION NAPA-GAMAY PINOT-BLANC in KEVIN'S KB and 

BURGUNDY VALPOLICELLA SOAVE in MARY's KB. 

Some of these values may be corresponding concepts with different 
labels; others may need to be added to or deleted from the 
consensus KB. 

Would you like some suggestions about how the labels might 
correspond? Y 

PINOT-GRIS may correspond to SOAVE since they both appear in similar 
rules: 

KEVIN'S KB                        MARY'S KB 

IF   WINE-COLOR is WHITE and      IF   WINE-RECOMMENDEDCOLOR is WHITE and 
     WINE-SWEETNESS is DRY             WINE-RECOMMENDEDSWEETNESS is DRY 
THEN WINE-TYPE is PINOT-GRIS      THEN WINE-TYPE is SOAVE 

Are PINOT-GRIS and SOAVE two different names for the same concept? N 
Ok. 

I have no further suggestions for how any of the unmatched values 
might correspond. 

The possible correspondences are: 

1) different names for the same concept 
2) different concepts that should be added to the consensus KB 
3) different concepts that should be deleted from the consensus KB. 

Please indicate which category each value belongs in, and I'll help you 
make the necessary changes. 



Figure 8: Attempting to match values. 
As it turns out, in this case the strategy fails, since pinot-gris and soave are in 
fact different types of wine. At this point CARTER has run out of useful suggestions 
about how to pair off the unmatched values in each knowledge base. Some of the 
other values may correspond, but CARTER cannot detect them because the experts 
have named them differently and used different premises to conclude them. The 
remaining non-shared values are simply missing from one of the knowledge bases. 
The system now offers one final piece of assistance, informing the experts of the kinds 
of discrepancies that can occur (accidental naming mismatches and genuine differences 
in the coverage of each KB), and asks them to determine the correct analysis for each 
of the non-shared values. 

After reaching agreement on the goal for the consensus knowledge base, the sys- 
tem then retrieves the attributes that determine the goal in each of the individual 
KBs and uses the techniques just illustrated to get agreement on these attributes. 
Once the system has agreement on these related attributes, it begins to detect and 
remove incompleteness and inconsistency in the rules that link them. Figure 9 shows 
an example of resolving two inconsistent rules: two rules with identical premises but 
different conclusions. 



Now, let's try to get agreement on the rules determining WINE-COLOR. 

I have noticed the following discrepancy: 

KEVIN'S KB                      MARY'S KB 

IF   ENTREE-NAME is TURKEY      IF   ENTREE-NAME is TURKEY 
THEN WINE-COLOR is WHITE        THEN WINE-COLOR is ROSE 

Since you two have already discussed the vocabulary involved here, I 
am pretty sure this problem is not due to a failure to consolidate 
WHITE and ROSE, or a misunderstanding about the meaning of 
ENTREE-NAME or WINE-COLOR. 

Would you like to include both of these rules in the consensus 
KB? N 

Figure 9: Detecting and analyzing inconsistent rules. 



CARTER knows three ways in which this can happen: 
1. a misunderstanding about the vocabulary: white and rose could be synonyms. 



This is ruled out because the experts have already agreed on the vocabulary. 

2. there is not really a mismatch because both rules should be in both knowledge 
bases (each expert forgot one rule that the other remembered). 

CARTER offers them this option, but they decline. 

3. both rules are over-generalized as stated: they are both missing an attribute 
whose value constitutes an important unstated assumption that the experts 
know but forgot to make explicit. 

When CARTER inquires about this last possibility (Figure 10), the experts agree 
that they omitted information about the entree-sauce. When they elaborate their 
rules to include this, we discover that they had different assumptions about the cor- 
responding value, so the rules are in fact mutually compatible and the experts agree 
that they should both be included in the consensus KB. The process continues with 
the system guiding the experts in resolving remaining details about the new attribute 
entree-sauce. 
In that case, let me propose another solution. 

It is possible that you two are making different assumptions about 
some other attribute which is important for determining the 
conclusion of the rule, but that neither of you have included in 
your KB. Take a moment to think about and discuss what other 
factors might be important in determining WINE-COLOR. 

Can you think of any other factors? Y 

What do you want to call this factor? ENTREE-SAUCE 

OK. KEVIN, what value for ENTREE-SAUCE did you have in mind when 
you wrote your rule? NONE 

MARY, what value for ENTREE-SAUCE did you have in mind when you 
wrote your rule? SWEET 

If we included ENTREE-SAUCE in the two rules, they would look like 
this: 

KEVIN'S KB MARY'S KB 

IF ENTREE-NAME is TURKEY and IF ENTREE-NAME is TURKEY and 

ENTREE-SAUCE is NONE ENTREE-SAUCE is SWEET 

THEN WINE-COLOR is WHITE THEN WINE-COLOR is ROSE 

Are both of you willing to include both of these rules in the 
consensus KB? Y 

Great! Now, to complete the repair process, we have to determine 
the set of consensus values for ENTREE-SAUCE, and add it into the 
other rules linking ENTREE-NAME and ENTREE-SAUCE to WINE-COLOR as 
appropriate. Then, we can go to work on the discrepancies which 
arise in these modified rules. 

Figure 10: Resolving the difference. 
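
The detection and repair steps shown in Figures 9 and 10 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CARTER's actual implementation: the `Rule` type and the functions `find_inconsistent_pairs` and `elaborate` are hypothetical names introduced here, and rules are simplified to a set of attribute-value premises with a single conclusion.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical simplification: premises are (attribute, value) pairs.
    premises: frozenset
    conclusion: tuple  # (attribute, value)
    author: str


def find_inconsistent_pairs(kb_a, kb_b):
    """Return rule pairs with identical premises that conclude
    different values for the same attribute."""
    by_premises = defaultdict(list)
    for rule in kb_b:
        by_premises[rule.premises].append(rule)
    pairs = []
    for rule_a in kb_a:
        for rule_b in by_premises.get(rule_a.premises, []):
            same_attribute = rule_a.conclusion[0] == rule_b.conclusion[0]
            if same_attribute and rule_a.conclusion[1] != rule_b.conclusion[1]:
                pairs.append((rule_a, rule_b))
    return pairs


def elaborate(rule, attribute, value):
    """Repair by elaboration: make an unstated assumption explicit
    by adding it to the rule as a new premise."""
    return rule._replace(premises=rule.premises | {(attribute, value)})


# The two rules from Figure 9.
kevin = [Rule(frozenset({("ENTREE-NAME", "TURKEY")}), ("WINE-COLOR", "WHITE"), "KEVIN")]
mary = [Rule(frozenset({("ENTREE-NAME", "TURKEY")}), ("WINE-COLOR", "ROSE"), "MARY")]

conflicts = find_inconsistent_pairs(kevin, mary)

# The experts' answers from Figure 10 become new premises.
rule_k, rule_m = conflicts[0]
repaired_k = elaborate(rule_k, "ENTREE-SAUCE", "NONE")
repaired_m = elaborate(rule_m, "ENTREE-SAUCE", "SWEET")
remaining = find_inconsistent_pairs([repaired_k], [repaired_m])
```

Once the premises differ, the two rules no longer collide, which mirrors the experts' decision to keep both in the consensus KB.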
5.2 CATALOGING DISCREPANCY KNOWLEDGE 

CARTER's expertise lies in recognizing and repairing a variety of discrepancies. This 
knowledge is organized in a catalog currently containing ten entries, each of which 
consists of a discrepancy detection procedure and a corresponding set of resolution 
procedures. This simple, detection-resolution style organization of the catalog makes 
it easier to add new entries as we gain more experience with consensus knowledge 
acquisition. 

To determine the kinds of discrepancies we needed to cover, we systematically 
compared a number of KBs at three different levels of abstraction. Viewing a knowl- 
edge base as a functional relationship leads us to focus on inputs (test data) and 
outputs (the generated recommendations). Viewing it in terms of individual rules 
centers on the detailed relationships between attribute-object-value triples. Those 
triples in turn define the vocabulary of the experts. Studying KBs from each of 
these three points of view gives us some assurance that we have achieved reasonable 
coverage of the set of possible discrepancies. 

We also found that discrepancies can be resolved through four general mecha- 
nisms that cut across these levels of abstraction: (i) negation, (ii) incorporation, (iii) 
compromise, and (iv) elaboration. Negation suggests that one expert should change 
something to remove a defect in his knowledge base, when the other expert convinces 
him that one of his judgments is incorrect. Incorporation suggests that one expert 
should add something to his KB that the other already has (or conversely that the 
other expert should remove it). This is useful when one KB has an incomplete set 
of objects, rules, or test cases (or the other KB has extraneous objects, rules, or test 
cases). Figure 6 provides an illustration with the deletion of wine-region. 

Compromise suggests that both experts change their KBs. It is helpful when the 
experts wish to establish a shared vocabulary or negotiate an intermediate settlement. 
One example is the decision to use wine-type as the shared name in Figure 7. 

Elaboration suggests that both experts add something to the KBs to remove 
discrepancies not otherwise resolvable. It is needed when a problem is not localizable 
in either KB individually, as when entree-sauce had to be added in Figure 10. 

The discrepancy catalog is a domain-independent source of knowledge that 
systematizes our approach to CKA. We can account for the differences between KBs in 
terms of the catalog entries and attempt to remove them through an associated resolu- 
tion mechanism. The result is a tool for partitioning the CKA problem and supporting 
the solution of each of the subproblems. 
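
One way to picture the detection-resolution organization of the catalog is as a list of entries, each pairing a detection procedure with its candidate resolutions. The sketch below is only an assumption about how such a catalog might be laid out; `CatalogEntry`, `Resolution`, `missing_attributes`, and `run_catalog` are hypothetical names, the KBs are reduced to bare attribute lists, and which KB contains WINE-REGION is chosen arbitrarily for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Mechanism(Enum):
    # The four general resolution mechanisms described above.
    NEGATION = "negation"
    INCORPORATION = "incorporation"
    COMPROMISE = "compromise"
    ELABORATION = "elaboration"


@dataclass
class Resolution:
    mechanism: Mechanism
    description: str


@dataclass
class CatalogEntry:
    name: str
    detect: Callable[[dict, dict], list]  # returns discrepancy instances
    resolutions: List[Resolution]


def missing_attributes(kb_a, kb_b):
    """Detect attributes that appear in exactly one of the two KBs."""
    return sorted(set(kb_a["attributes"]) ^ set(kb_b["attributes"]))


# A one-entry catalog; the text describes ten entries, not reproduced here.
catalog = [
    CatalogEntry(
        name="attribute present in only one KB",
        detect=missing_attributes,
        resolutions=[
            Resolution(Mechanism.INCORPORATION, "add it to the other KB, or delete it"),
            Resolution(Mechanism.COMPROMISE, "agree on a shared name if it is a synonym"),
        ],
    ),
]


def run_catalog(catalog, kb_a, kb_b):
    """Apply every detector and pair each finding with its candidate mechanisms."""
    findings = []
    for entry in catalog:
        for instance in entry.detect(kb_a, kb_b):
            mechanisms = [r.mechanism for r in entry.resolutions]
            findings.append((entry.name, instance, mechanisms))
    return findings


# Illustrative attribute lists (assumed, not taken from the actual KBs).
kb_a = {"attributes": ["ENTREE-NAME", "WINE-COLOR", "WINE-REGION"]}
kb_b = {"attributes": ["ENTREE-NAME", "WINE-COLOR"]}
findings = run_catalog(catalog, kb_a, kb_b)
```

Under this layout, adding a new kind of discrepancy amounts to appending one more entry, which is the property the catalog organization is meant to provide.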

6 RELATED WORK 

Two previous efforts are similar in general spirit to ours. A previous use of debugging 
in this general area is the Delphi technique (Helmer and Rescher, 1959; Jagannathan 
and Elmaghraby, 1985), used to achieve consensus among a group of experts on a 
specific issue. It is a three-step, iterative process involving (i) submitting individual 
opinions and their supporting reasoning to a skilled facilitator, (ii) preparation of 
a summary report by the facilitator, and (iii) forwarding of the report back to the 
experts. Here the facilitator plays a role in debugging by attempting to clarify the 
specific areas of disagreement among the experts. Although entirely manual, this role 
demonstrates one possible activity of a CKA debugging program, that of "setting the 
agenda" for discussion between experts. 

More recently, work by Boose (1986) and Plaza et al. (1987) has been focused 
on using knowledge of multiple experts. They concentrate for the most part on a 
number of schemes for using the knowledge rather than resolving discrepancies. They 
suggest combining expertise simply by adding both experts' knowledge to a single 
knowledge base, tagging each rule with its author, and then allowing a number of 
basic strategies. In one case the user simply has to decide which expert to believe, 
in another the user can weight the experts' opinions, etc. The guidance they do offer 
in reaching consensus on the knowledge is relatively modest. They proceed from the 
repertory grid notion (Kelly, 1955) that underlies their work and suggest that all 
the vocabulary terms used by each expert individually to construct his own grid be 
combined to form a single, larger vocabulary that will then be used by each expert 
to construct a new grid. They acknowledge that the experts may be unfamiliar with 
each other's terms and "may have to 'guess' what was meant" when they encounter 
an unfamiliar term in the grid. 

Another recent study (Klein et al., 1989) addresses the issue of resolving con- 
flicting design specifications. Through direct observation of architects cooperatively 
developing a design for a house, they developed a conflict class hierarchy for identi- 
fying and resolving differences between design alternatives. Although this typology 
of conflicts is similar in some respects to our discrepancy catalog, one important dif- 
ference is in the content: their primary focus is on reconciling the designs themselves 
rather than design knowledge. 

7 CONTRIBUTIONS, EXTENSIONS, LIMITATIONS 

The primary contribution of this work is the store of detailed information we have 
codified for facilitating CKA. It represents a small but growing and relatively system- 
atic expression of knowledge that was previously informal, experiential, and largely 
tacit. 

A second contribution arises from the surprisingly effective degree of 
bootstrapping the system displays. The system must make its best guess about the 
meaning of a term from the way it is used in a knowledge base; it can gather only 
circumstantial evidence of the sort we reviewed above; and it must, paradoxically, 
gather that evidence from the very same knowledge bases it is attempting to modify 
to reach consensus. It is thus striking how effective the system's heuristics are at 
guiding it: they allow it to make plausible judgments about which concepts match, 
so that even when it has to ask the experts, the questions are for the most part 
sensible and well chosen. 

A third contribution is the role of our work as a general model for constructing 
systems that detect and resolve knowledge-level discrepancies. While our current 
system removes discrepancies in rules and attribute-object-value triples, we believe 
debugging and repair strategies can equally well be organized around other kinds of 
knowledge structures, including decision trees, frames, and database schemas. The 
fundamental process involves three steps: identify the various elements of the repre- 
sentation (e.g., alternatives, events, payoffs, probabilities), develop a taxonomy of how 
the representations can differ across these elements (e.g., one expert has an additional 
alternative), and finally prescribe possible resolutions for each of these discrepancies 
(e.g., one expert has to add an alternative). The four resolution mechanisms 
described above may provide additional guidance in this last step. 

The fourth contribution is the ability of the debugging approach to support 
the early phase of CKA. Recall that the other techniques for reconciling multiple 
experts — combination and argumentation — are most effective only after we have es- 
tablished that a conflict exists and that it is difficult or impossible to resolve. Our 
technique is useful in the important previous stage when we are still trying to make 
sense of how the KBs compare. It would be unwise for experts to argue about their 
differing positions before they had established that a real conflict existed. The size 
of the discrepancy catalog suggests that a surprising number of inconsistencies can be 
reconciled without resorting to argumentation or outcome combination methods. 

7.1 Future Work 

Although the discrepancy catalog is an important and effective first step, considerable 
work remains. One of the most important areas for future research is the question of 
discrepancy resolution strategy. While the strategy discussed in Section 5.1 (starting 
at the outcome and working backward) is very useful, it is only one of many 
possibilities. One problem is that this strategy may be too myopic to be effective in 
a large-scale knowledge base. The system in effect dives immediately into the details, 
and it needs a better sense of the larger picture. Our next task is thus to generate a number 
of strategies and evaluate them in terms of (i) the efficiency and effectiveness with 
which they increase the degree of consensus, and (ii) the naturalness and coherence 
of the dialogues they produce. 

We will investigate strategies organized around two kinds of approaches. The first 
approach relies on systematic traversal of the KB. One example of this was illustrated 
earlier (working backward from the goal); we intend to examine two others that are 
also likely to be effective: working forward from the inputs, and working in both 
directions from any agreed-on point. We expect that working forward from inputs 
should be effective on the grounds that the two KBs are likely to work from the same 
basic information. We believe that beginning at an intermediate point of agreement and expanding in 
both directions will exploit the strategy of emphasizing what the experts already 
agree on and building from this foundation. 
The other approach involves assigning a score to each discrepancy based on its 
severity (how dissimilar the two concepts are), the value of resolving it (how much 
resolution would increase the degree of consensus), and the likelihood that it will 
not be resolved through the removal of another discrepancy (e.g., when the resolu- 
tion of an attribute discrepancy reconciles their corresponding rules). The system's 
choice of discrepancy could then be guided by a hill-climbing strategy, always choos- 
ing the discrepancy with the highest score. Another form of guidance can be supplied 
by precedence relationships between both knowledge base elements (e.g., attributes 
should be resolved before rules) and resolution mechanisms (e.g., attempt incorpo- 
ration before elaboration). Thus far we have implemented scores based on severity, 
determining which attributes to match up next. 
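
The scoring approach can be sketched as follows. This is an illustration under several assumptions rather than a description of the implemented system: the text does not say how the three factors combine or in which direction severity should count, so the weighted sum below, the `Discrepancy` fields, the `PRECEDENCE` table, and all numeric scores are hypothetical. Precedence is applied first (attributes before rules), and the score breaks ties within a kind.

```python
from dataclasses import dataclass

# Precedence between knowledge base elements: lower numbers are resolved first.
PRECEDENCE = {"attribute": 0, "rule": 1}


@dataclass
class Discrepancy:
    name: str
    kind: str            # "attribute" or "rule"
    severity: float      # how dissimilar the two concepts are (0..1)
    value: float         # how much resolution would increase consensus (0..1)
    independence: float  # likelihood it will NOT vanish when another is resolved (0..1)


def score(d, weights=(1.0, 1.0, 1.0)):
    """Assumed combination: a weighted sum of the three factors."""
    w_sev, w_val, w_ind = weights
    return w_sev * d.severity + w_val * d.value + w_ind * d.independence


def pick_next(discrepancies, weights=(1.0, 1.0, 1.0)):
    """Greedy (hill-climbing) choice: best precedence first, then highest score."""
    return max(discrepancies, key=lambda d: (-PRECEDENCE[d.kind], score(d, weights)))


def resolution_order(discrepancies, weights=(1.0, 1.0, 1.0)):
    """Repeatedly pick the best remaining discrepancy."""
    remaining = list(discrepancies)
    order = []
    while remaining:
        chosen = pick_next(remaining, weights)
        order.append(chosen.name)
        remaining.remove(chosen)
    return order


# Invented scores, for illustration only.
pending = [
    Discrepancy("WINE-COLOR rules for TURKEY", "rule", 0.9, 0.8, 0.3),
    Discrepancy("WINE-REGION", "attribute", 0.4, 0.5, 0.9),
    Discrepancy("ENTREE-SAUCE", "attribute", 0.6, 0.7, 0.8),
]
order = resolution_order(pending)
```

In a fuller version the scores would be recomputed after each resolution, since removing one discrepancy can eliminate or change others; the static ordering here ignores that effect.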

Several other extensions to the system may also be desirable. First, additional 
knowledge, not available from the structure of the KBs themselves, will likely provide 
the system with additional power. Clancey (1986), for example, notes that many 
current rule-based systems employ a problem solving technique called structured se- 
lection, characterized by abstracting from specific data (e.g., classifying a patient 
based on patient data), followed by heuristic matching (associating a patient class 
to a disease), and then solution refinement (refining from disease category to spe- 
cific disease). Each of these three subtasks is carried out by different sets of rules. 
If CARTER could determine which rules belonged to each subtask, it could use this 
knowledge to characterize discrepancies more precisely and organize its presentation 
of choices to the expert. It would as a result be using knowledge about the character 
of the task (structured selection), in addition to its existing knowledge about rules, 
attribute-object-value triples, etc. 

Second, for the cases in which debugging alone fails to result in a consensus KB, it 
would be helpful to give CARTER the ability to support formal argumentation between 
the experts or suggest resolutions based on combination methods (e.g., averaging 
certainty factors). Finally, we might streamline the resolution process, as in the 
instances in which the incrementalism of the debugging approach is inefficient. For 
example, the system may prescribe a number of isolated modifications to the KBs 
when it would be easier simply to redo an entire section all at once. It would be nice 
to be able to recognize such situations. 

The bootstrapping nature of the system has substantial implications for its perfor- 
mance. In general, the more any bootstrapping program knows, the more effectively 
it can perform, and conversely. CARTER will perform well when a large number of 
similarities exist from which to gain a foothold, but will degrade significantly when 
few are found. Seemingly trivial differences like different abbreviations in the labels 
used for values can make matching very difficult. 

Our attempt to discern the meaning of terms by bootstrapping from the existing 
knowledge base can also run into trouble in circumstances that are unusual, but not 
impossible. The question of whether two concepts mean the same thing is in fact deep 
and in general extremely difficult to answer with assurance. Even the best human 
mediator working with two cooperative experts may find out only after considerable 
time has elapsed that two terms thought to be synonymous in fact had importantly 
different shades of meaning. The best we can do here is accumulate all available 
circumstantial evidence and use it in the most effective order (comparing names, 
values, and topology, then eventually asking the experts). In doing so we reduce the 
chance of being misled, but must remain aware of the possibility of it happening. 

8 CONCLUSION 

We have described a novel approach to and prototype system for facilitating consensus 
knowledge acquisition. The key contributions of this work include the development of 
a detailed store of knowledge for detecting and resolving discrepancies in rule-based 
systems and a general procedure for developing similar systems for other representa- 
tions. We expect the next advance in this area to come from implementing improved 
discrepancy resolution strategies. This work will serve as the starting point for under- 
standing more generally how experts reach consensus and how we can best support 
them in their efforts to do so. 